{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Llama源码中的RoPE\n",
    "\n",
    "假设d_model=6，此时正在处理一个文本序列中的第二个token，即m=1。则有：\n",
    "$$q=(q_{0},q_{1},q_{2},q_{3},q_{4},q_{5}) \\\\ cos=(cos \\theta_{0},cos \\theta_{1},cos \\theta_{2},cos \\theta_{0},cos \\theta_{1},cos \\theta_{2}) \\\\ sin=(sin \\theta_{0},sin \\theta_{1},sin \\theta_{2},sin \\theta_{0},sin \\theta_{1},sin \\theta_{2}) \\\\ rotateHalf(q)=(-q_{3},-q_{4},-q_{5},q_{0},q_{1},q_{2})$$\n",
    "\n",
    "于是，对q的RoPE嵌入计算为：\n",
    "\n",
    "$$\n",
    "qEmbed=\n",
    "\\begin{pmatrix}\n",
    "q_{0} \\\\\n",
    "q_{1} \\\\\n",
    "q_{2} \\\\\n",
    "q_{3} \\\\\n",
    "q_{4} \\\\\n",
    "q_{5}\n",
    "\\end{pmatrix}\n",
    "\\otimes\n",
    "\\begin{pmatrix}\n",
    "cos \\theta_{0} \\\\\n",
    "cos \\theta_{1} \\\\\n",
    "cos \\theta_{2} \\\\\n",
    "cos \\theta_{0} \\\\\n",
    "cos \\theta_{1} \\\\\n",
    "cos \\theta_{2}\n",
    "\\end{pmatrix}\n",
    "+\n",
    "\\begin{pmatrix}\n",
    "-q_{3} \\\\\n",
    "-q_{4} \\\\\n",
    "-q_{5} \\\\\n",
    "q_{0} \\\\\n",
    "q_{1} \\\\\n",
    "q_{2}\n",
    "\\end{pmatrix}\n",
    "\\otimes\n",
    "\\begin{pmatrix}\n",
    "sin \\theta_{0} \\\\\n",
    "sin \\theta_{1} \\\\\n",
    "sin \\theta_{2} \\\\\n",
    "sin \\theta_{0} \\\\\n",
    "sin \\theta_{1} \\\\\n",
    "sin \\theta_{2}\n",
    "\\end{pmatrix}\n",
    "$$\n",
    "\n",
    "上式可以写成：\n",
    "$$\n",
    "qEmbed=\n",
    "\\begin{pmatrix}\n",
    "q_{0} \\\\\n",
    "q_{3} \\\\\n",
    "q_{1} \\\\\n",
    "q_{4} \\\\\n",
    "q_{2} \\\\\n",
    "q_{5}\n",
    "\\end{pmatrix}\n",
    "\\otimes\n",
    "\\begin{pmatrix}\n",
    "cos \\theta_{0} \\\\\n",
    "cos \\theta_{0} \\\\\n",
    "cos \\theta_{1} \\\\\n",
    "cos \\theta_{1} \\\\\n",
    "cos \\theta_{2} \\\\\n",
    "cos \\theta_{2}\n",
    "\\end{pmatrix}\n",
    "+\n",
    "\\begin{pmatrix}\n",
    "-q_{3} \\\\\n",
    "q_{0} \\\\\n",
    "-q_{4} \\\\\n",
    "q_{1} \\\\\n",
    "-q_{5} \\\\\n",
    "q_{2}\n",
    "\\end{pmatrix}\n",
    "\\otimes\n",
    "\\begin{pmatrix}\n",
    "sin \\theta_{0} \\\\\n",
    "sin \\theta_{0} \\\\\n",
    "sin \\theta_{1} \\\\\n",
    "sin \\theta_{1} \\\\\n",
    "sin \\theta_{2} \\\\\n",
    "sin \\theta_{2}\n",
    "\\end{pmatrix}\n",
    "$$\n",
    "\n",
    "推广到一般情况有：\n",
    "\n",
    "$$\n",
    "qEmbed=\n",
    "\\begin{pmatrix}\n",
    "q_{0} \\\\\n",
    "q_{d/2} \\\\\n",
    "q_{1} \\\\\n",
    "q_{d/2+1} \\\\\n",
    "...  \\\\\n",
    "q_{d/2-1} \\\\\n",
    "q_{d-1}\n",
    "\\end{pmatrix}\n",
    "\\otimes\n",
    "\\begin{pmatrix}\n",
    "cos m\\theta_{0} \\\\\n",
    "cos m\\theta_{0} \\\\\n",
    "cos m\\theta_{1} \\\\\n",
    "cos m\\theta_{1} \\\\\n",
    "... \\\\\n",
    "cos m\\theta_{d/2-1} \\\\\n",
    "cos \\theta_{d/2-1}\n",
    "\\end{pmatrix}\n",
    "+\n",
    "\\begin{pmatrix}\n",
    "-q_{d/2} \\\\\n",
    "q_{0} \\\\\n",
    "-q_{d/2+1} \\\\\n",
    "q_{1} \\\\\n",
    "... \\\\\n",
    "-q_{d-1} \\\\\n",
    "q_{d/2-1}\n",
    "\\end{pmatrix}\n",
    "\\otimes\n",
    "\\begin{pmatrix}\n",
    "sin m\\theta_{0} \\\\\n",
    "sin m\\theta_{0} \\\\\n",
    "sin m\\theta_{1} \\\\\n",
    "sin m\\theta_{1} \\\\\n",
    "... \\\\\n",
    "sin m\\theta_{d/2-1} \\\\\n",
    "sin \\theta_{d/2-1}\n",
    "\\end{pmatrix}\n",
    "$$\n",
    "\n",
    "所以不同元素的相对位置一致，尽管与原始论文不一样，但依然可以达到位置嵌入的目的。\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
